Discontinuous Data-Oriented Parsing: A mildly context-sensitive all-fragments grammar
Authors
Abstract
Recent advances in parsing technology have made treebank parsing with discontinuous constituents possible, with parser output of competitive quality (Kallmeyer and Maier, 2010). We apply Data-Oriented Parsing (DOP) to a grammar formalism that allows for discontinuous trees (LCFRS). Decisions during parsing are conditioned on all possible fragments, resulting in improved performance. Despite the fact that both DOP and discontinuity present formidable challenges in terms of computational complexity, the model is reasonably efficient, and surpasses the state of the art in discontinuous parsing.
Similar resources
Discontinuous Data-Oriented Parsing through Mild Context-Sensitivity
It has long been argued that incorporating a notion of discontinuity in phrase-structure is desirable, given phenomena such as topicalization and extraposition, and particular features of languages such as cross-serial dependencies in Dutch and the German Mittelfeld. Up until recently this was mainly a theoretical topic, but advances in parsing technology have made treebank parsing with discont...
Rich Statistical Parsing and Literary Language
This thesis applies the Data-Oriented Parsing framework in two areas: parsing & literature. The data-oriented approach rests on the assumption that re-use of chunks of training data can be detected and exploited at test time. Syntactic tree fragments form the common thread in the thesis. Chapter 2 presents a method to efficiently extract them from treebanks, based on heuristic...
Discontinuous Parsing with an Efficient and Accurate DOP Model
We present a discontinuous variant of tree-substitution grammar (TSG) based on Linear Context-Free Rewriting Systems. We use this formalism to instantiate a Data-Oriented Parsing model applied to discontinuous treebank parsing, and obtain a significant improvement over earlier results for this task. The model induces a TSG from the treebank by extracting fragments that occur at least twice. We g...
Discontinuity and Non-Projectivity: Using Mildly Context-Sensitive Formalisms for Data-Driven Parsing
We present a parser for probabilistic Linear Context-Free Rewriting Systems and use it for constituency and dependency treebank parsing. The choice of LCFRS, a formalism with an extended domain of locality, enables us to model discontinuous constituents and non-projective dependencies in a straightforward way. The parsing results show that, firstly, our parser is efficient enough to be used for...
Polynomial Pregroup Grammars parse Context Sensitive Languages
Pregroup grammars with a possibly infinite number of lexical entries are polynomial if the length of type assignments for sentences is a polynomial in the number of words. Polynomial pregroup grammars are shown to generate the standard mildly context-sensitive formal languages as well as some context-sensitive natural language fragments of Dutch, Swiss German or Old Georgian. A polynomial recogn...